HW 3 Drafting Viz

Author

Ian Morris-Sibaja

Published

March 4, 2025

Can’t spell Disaster without… Pisaster??

Pisaster is a genus of sea stars that act as keystone species in the intertidal zone. These sea stars are voracious predators of mussels. Where Pisaster are present, they keep mussel populations in check, which in turn has cascading effects on the rest of the ecosystem.

1. Which option do you plan to pursue? It’s okay if this has changed since HW #1.

I plan to continue with option 1.

2. Restate your question(s). Has this changed at all since HW #1? If yes, how so?

My overall question is: How have Pisaster, a genus of sea stars, been affected by changing environments over the past 20 years? My three subquestions are: How has the abundance of Pisaster species changed over time in California? How has their distribution changed over time in California? How has their presence changed over time in California?

The question has become more specific since HW #1. I decided to focus on Pisaster species, which are keystone species in the intertidal zone, rather than on all coastal species.

3. Explain which variables from your data set(s) you will use to answer your question(s), and how.

I plan to reduce my data to four variables: species, year, longitude, and latitude. I will then group and summarise to get a count of each species by year, and bin both year and latitude. Finally, I will plot these summaries to visualize how they change over time.
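
The plan above can be sketched on a small made-up table. The column names follow the cleaning code later in this document; the values are invented for illustration, and the bin edges are only one possible choice.

```r
library(dplyr)

# Toy records (invented values) with the variables named above
toy <- tibble(
  species_lump = c("Pisaster ochraceus", "Pisaster ochraceus",
                   "Pisaster giganteus", "Pisaster ochraceus"),
  year         = c(2002, 2004, 2006, 2010),
  latitude     = c(33.5, 34.0, 36.2, 38.9),
  total_count  = c(5, 3, 2, 7)
)

toy_binned <- toy %>%
  mutate(
    # right = FALSE closes each bin on the left: [2001, 2005), [2005, 2009), ...
    year_bin = cut(year, breaks = c(2001, 2005, 2009, 2013), right = FALSE,
                   labels = c("2001-2004", "2005-2008", "2009-2012")),
    # 2-degree latitude bands, also closed on the left: [32, 34), [34, 36), ...
    lat_bin = cut(latitude, breaks = seq(32, 40, 2), right = FALSE,
                  labels = c("32-34", "34-36", "36-38", "38-40"))
  )

# Summed counts per species and year bin
by_year <- toy_binned %>%
  count(species_lump, year_bin, wt = total_count, name = "total_count")

# Summed counts per latitude band
by_lat <- toy_binned %>%
  count(lat_bin, wt = total_count, name = "total_count")
```

Closing the bins on the left means a latitude of exactly 34 falls in the 34-36 band rather than the band below it, which keeps the labels honest.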

4. Inspirational Visualizations and explain which elements you might borrow

WE MUST PROTECT MARINE ENVIRONMENTS

I like how the colors separate the brochure into thirds. It is pleasing to the eye and will work well with my three subquestions.

WE MUST PROTECT MARINE ENVIRONMENTS

I like how the colors fill the animal outline. I think this can help me portray abundance within this context.

5. Hand-draw your anticipated visualizations

Mock Up

Set Up
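
The setup chunk is not shown in this rendering. Judging only from the functions called below, the loaded packages are probably close to the following. This list is an inference from those calls, not the author's actual chunk.

```r
# Assumed package list, inferred from the function calls in this document
library(tidyverse)  # dplyr, ggplot2, stringr::word()
library(readxl)     # read_excel()
library(here)       # here()
library(sf)         # read_sf(), st_transform(), st_as_sf(), st_crs()
library(spData)     # us_states
library(ggridges)   # geom_density_ridges_gradient(), scale_color_cyclical()
```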

Data Import

# Read in three excel files from MARINe biodiversity data 
point_contact_raw <- read_excel(here('data', 'MARINe_biodiversity_data',
                                     'cbs_data_CA_2023.xlsx'), sheet = 'point_contact_summary_data')
quadrat_raw <- read_excel(here('data', 'MARINe_biodiversity_data',
                               'cbs_data_CA_2023.xlsx'), sheet = 'quadrat_summary_data')
swath_raw <- read_excel(here('data', 'MARINe_biodiversity_data',
                             'cbs_data_CA_2023.xlsx'), sheet = 'swath_summary_data')

# Read in Dangermond preserve shape file 
dangermond <- read_sf(here('data', 'dangermond_shapefile', 'jldp_boundary.shp'))

# Read in California state boundary 
california <- spData::us_states %>% 
  filter(NAME == "California")

Data Cleaning

Lots of cleaning based on our capstone project

# Clean point_contact dataset 
point_contact_clean <- point_contact_raw %>% 
  # Remove non-matching columns 
  select(!c('number_of_transect_locations', 'percent_cover')) %>% 
  # Rename num of hits to total count 
  rename(total_count = number_of_hits) %>% 
  # Create new data collection source column 
  mutate(collection_source = "point contact") %>% 
  # Filter to mainland only 
  filter(island == "Mainland") %>% 
  # Remove certain species lumps 
  filter(!species_lump %in% c("Rock", "Sand", "Tar", "Blue Green Algae", "Red Crust", "Diatom", "Ceramiales"))

# Clean quadrat dataset 
quadrat_clean <- quadrat_raw %>% 
  # Remove non-matching columns 
  select(!c('number_of_quadrats_sampled', 'total_area_sampled_m2', 'density_per_m2')) %>% 
  # Create new data collection source column 
  mutate(collection_source = "quadrat") %>% 
  # Filter to mainland only 
  filter(island == "Mainland") %>% 
  # Remove certain species lumps 
  filter(!species_lump %in% c("Rock", "Sand", "Tar", "Blue Green Algae", "Red Crust", "Diatom", "Ceramiales"))

# Clean swath dataset 
swath_clean <- swath_raw %>% 
  # Remove non-matching columns 
  select(!c('number_of_transects_sampled', 'est_swath_area_searched_m2',  'density_per_m2')) %>% 
  # Create new data collection source column 
  mutate(collection_source = "swath") %>% 
  # Filter to mainland only 
  filter(island == "Mainland") %>% 
  # Remove certain species lumps 
  filter(!species_lump %in% c("Rock", "Sand", "Tar", "Blue Green Algae", "Red Crust", "Diatom", "Ceramiales"))

Merge datasets

Merge data sets for easy calculations

# Merge the three datasets together and keep records before 2021
biodiv_merge <- bind_rows(point_contact_clean, quadrat_clean, swath_clean) %>% 
  filter(year < 2021)

# Sum counts by genus, species, year, and site location
genus_sum <- biodiv_merge %>% 
  # The genus is the first word of the species lump
  mutate(genus = word(species_lump)) %>% 
  group_by(genus, species_lump, year, longitude, latitude) %>% 
  summarise(num_count = sum(total_count), .groups = "drop") %>% 
  # Create column to indicate presence/absence
  mutate(presence = ifelse(num_count >= 1, 1, 0)) 

California Data

# Transform to WGS84 (EPSG:4326) so coordinates are in longitude/latitude
california <- st_transform(california, crs = 4326)

Convert biodiv data to sf object

genus_sum_geo <- genus_sum %>% 
  filter(presence == 1) %>%  
  st_as_sf(coords = c("longitude", "latitude"), crs = st_crs(california), remove = FALSE)

# Check that the crs matches 
if(st_crs(california) == st_crs(genus_sum_geo)) {
  print("The coordinate reference systems match")
} else {
  print("The coordinate reference systems do NOT match. Transformation of CRS is recommended.")
}
[1] "The coordinate reference systems match"

Filter Genus

genus_count <- genus_sum %>% 
  group_by(genus) %>% 
  summarise(
    total_count = sum(num_count)
  ) %>% 
  # Keep the 10 most abundant genera, then sort ascending
  slice_max(order_by = total_count, n = 10) %>% 
  arrange(total_count)

Pisaster Abundance Plot

# Sum Pisaster counts by species, year, and site location
pisaster_sum <- biodiv_merge %>% 
  mutate(genus = word(species_lump)) %>% 
  filter(genus == "Pisaster") %>%
  group_by(species_lump, year, longitude, latitude) %>% 
  summarise(num_count = sum(total_count)) %>% 
  # Create column to indicate presence/absence
  mutate(presence = ifelse(num_count >= 1, 1, 0))  

# Total count per species per year
pisaster_sum_year <- pisaster_sum %>% 
  group_by(year, species_lump) %>% 
  summarise(
    total_count = sum(num_count)
  ) %>% 
  arrange(year)

# Keep records where Pisaster was present and convert to an sf object
pisaster_sum_geo <- pisaster_sum %>% 
  filter(presence == 1) %>%  
  st_as_sf(coords = c("longitude", "latitude"), crs = st_crs(california), remove = FALSE)

# Custom ocean-colored theme shared by the plots below
theme_ocean <- theme_minimal(base_size = 14) +
  theme(
    panel.background = element_rect(fill = "#b3e5fc", color = NA),  # Light ocean blue background
    panel.grid.major = element_line(color = "#b3e5fc", linetype = 0),  # linetype 0 is blank, so major grid lines are hidden
    panel.grid.minor = element_blank(),
    legend.background = element_rect(fill = "#ffffff", color = NA),
    legend.position = "right",
    axis.text = element_text(size = 12, color = "#01579b"),  # Deep ocean blue text
    axis.title = element_text(size = 14, face = "bold", color = "#004d40"),
    plot.title = element_text(size = 18, face = "bold", color = "#004d40", hjust = 0.5),
    plot.subtitle = element_text(size = 14, color = "#004d40", hjust = 0.5)
  )

# Copy the yearly totals before binning years
pisaster_sum_perc <- pisaster_sum_year 

pisaster_sum_perc$year_bin <- cut(pisaster_sum_perc$year, 
                                  breaks = c(2001, 2005, 2009, 2013, 2017, 2021),  
                                  include.lowest = TRUE, 
                                  right = FALSE,
                                  labels = c("2001-2004",
                                             "2005-2008",
                                             "2009-2012",
                                             "2013-2016",
                                             "2017-2020"))

# Compute total_count_sum first using summarise()
pisaster_sum_perc <- pisaster_sum_perc %>%
  group_by(species_lump, year_bin) %>%
  summarise(total_count_sum = sum(total_count, na.rm = TRUE), .groups = "drop") 

# Plot summed counts per year bin, one panel per species
ggplot(pisaster_sum_perc, aes(x = year_bin, y = total_count_sum, fill = species_lump)) +
  geom_col(position = "dodge") +  # Dodge to see separate species
  scale_fill_manual(values = c("Pisaster ochraceus" = "#6A0DAD", "Pisaster giganteus" = "#FF8C00")) +
  labs(
    title = "Sea Star Abundance Trends Over Time",
    subtitle = "Fluctuations occur in Pisaster species, a genus of Sea Star",
    x = "Year",
    y = "Total Count",
    fill = "Species"
  ) +
  theme_ocean +
  theme(
    panel.grid.minor = element_blank(),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14),
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
  ) +
  facet_wrap(~species_lump, scales = "free_y", ncol = 1, nrow = 2)
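
The plot above is built from summed counts per year bin. If a share of each species' overall total were wanted instead, one possible normalization is sketched below. The numbers are invented, and the column name `pct_of_species_total` is hypothetical.

```r
library(dplyr)

# Invented binned totals, shaped like pisaster_sum_perc
binned <- tibble(
  species_lump    = rep(c("Pisaster ochraceus", "Pisaster giganteus"), each = 2),
  year_bin        = rep(c("2001-2004", "2005-2008"), times = 2),
  total_count_sum = c(30, 10, 6, 2)
)

# Express each bin as a percentage of that species' total across all bins
binned_pct <- binned %>%
  group_by(species_lump) %>%
  mutate(pct_of_species_total = 100 * total_count_sum / sum(total_count_sum)) %>%
  ungroup()
```

Normalizing within species puts both species on the same 0 to 100 scale, so a shared y-axis becomes meaningful even when their raw counts differ by an order of magnitude.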

Latitudinal Shift Map

# Bin latitude into 2-degree bands, closed on the left: [32, 34), [34, 36), ...
pisaster_sum$lat_bin <- cut(pisaster_sum$latitude, 
                            breaks = seq(32, 42, 2),  
                            include.lowest = TRUE,
                            right = FALSE,
                            labels = c("32 - 34",
                                       "34 - 36",
                                       "36 - 38",
                                       "38 - 40",
                                       "40 - 42"))


pisaster_sum$year_bin <- cut(pisaster_sum$year, 
                             breaks = c(2001, 2005, 2009, 2013, 2017, 2021),  
                             include.lowest = TRUE, 
                             right = FALSE,
                             labels = c("2001-2004",
                                        "2005-2008",
                                        "2009-2012",
                                        "2013-2016",
                                        "2017-2020"))
# Maybe I can group a mean by the latitudinal bins, then show how those shift on a plot of CA

ggplot(pisaster_sum, aes(x = latitude, y = year_bin, color = year_bin)) +
  geom_density_ridges_gradient(scale = 3, rel_min_height = 0.01, 
                               fill = NA) +
  scale_color_cyclical(name = "year_bin", guide = "legend",
                       values = c("#6A0DAD", "#FF8C00")) +
  theme_minimal(base_size = 14) +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    legend.position = "none",
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14),
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
  ) +
  labs(title = "Latitudinal Shift of Pisaster Over 20 Years",
       x = "Latitude",
       y = "Year",
       fill = "Density") 
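
The idea of tracking a mean latitude per time period could be sketched as a count-weighted mean, as below. The data are invented, and weighting by `num_count` is an assumption about what "mean" should represent here. An unweighted mean of site latitudes would be an equally valid choice.

```r
library(dplyr)

# Invented site records: counts at different latitudes in two year bins
toy_lat <- tibble(
  year_bin  = c("2001-2004", "2001-2004", "2017-2020", "2017-2020"),
  latitude  = c(34, 36, 36, 38),
  num_count = c(3, 1, 1, 3)
)

# Count-weighted mean latitude per year bin
lat_shift <- toy_lat %>%
  group_by(year_bin) %>%
  summarise(mean_lat = weighted.mean(latitude, w = num_count), .groups = "drop")
```

In this toy example the weighted mean moves from 34.5 to 37.5 degrees, which is the kind of single-number summary that could be drawn as a marker on a map of California.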

Abundance/Presence Plot

# Convert presence to a labelled factor
pisaster_sum$presence <- factor(pisaster_sum$presence, levels = c(0, 1), labels = c("Absence (0)", "Presence (1)"))
# Bar chart of record counts per year bin, filled by presence/absence
ggplot(pisaster_sum, aes(x = year_bin, fill = presence, group = year_bin)) +
  geom_bar() +
  facet_wrap(~ presence, scales = "free_y") +  # One panel each for absence and presence
  labs(title = "Count of Site Records with Presence vs Absence by Year", 
       x = "Year", 
       y = "Count") +
  scale_fill_manual(values = c("Presence (1)" = "#6A0DAD", "Absence (0)" = "#FF8C00")) +
  theme_ocean +
  theme(
    panel.grid.minor = element_blank(),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14),
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
  )
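
Because the bar chart shows raw record counts, a proportion per year bin may be easier to read as a line. The sketch below uses invented records and assumes `presence` is still coded as numeric 0/1, that is, before the factor conversion above.

```r
library(dplyr)
library(ggplot2)

# Invented site records with a 0/1 presence flag
toy_presence <- tibble(
  year_bin = c("2001-2004", "2001-2004",
               "2005-2008", "2005-2008", "2005-2008", "2005-2008"),
  presence = c(1, 0, 1, 1, 1, 0)
)

# Share of records in each year bin where Pisaster was present
presence_prop <- toy_presence %>%
  group_by(year_bin) %>%
  summarise(prop_present = mean(presence == 1), .groups = "drop")

# group = 1 connects the points even though year_bin is discrete
presence_line <- ggplot(presence_prop,
                        aes(x = year_bin, y = prop_present, group = 1)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Proportion of records with presence")
```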

7. Answer the following questions:

  1. What challenges did you encounter or anticipate encountering as you continue to build / iterate on your visualizations in R? If you struggled with mocking up any of your three visualizations (from #6, above), describe those challenges here.

For visualization 1, I need to learn how to plot bars as images (i.e., sea stars). For visualization 2, I need to learn how to plot density ridges geographically to represent species ranges. For visualization 3, I need to learn how to change my bar plot to a line plot with the relevant data.

  2. What ggplot extension tools / packages do you need to use to build your visualizations? Are there any that we haven’t covered in class that you’ll be learning how to use for your visualizations?

In addition to ggplot2, I plan to use gganimate, which we have not covered yet. I will use it to show how the latitudinal plots shift over time.

  3. What feedback do you need from the instructional team and / or your peers to ensure that your intended message is clear?

I would like more feedback on whether my concepts are worth pursuing in the first place. I also wonder which fonts and colors would make this most visually appealing.
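
As a starting point for the gganimate plan mentioned in the answers above, here is a minimal sketch of an animation that steps through year bins. The data are invented, and the choice of `transition_states()` over other gganimate transitions is an assumption rather than a settled design.

```r
library(ggplot2)
library(gganimate)

# Invented counts by latitude for two year bins
toy_anim <- data.frame(
  year_bin  = rep(c("2001-2004", "2017-2020"), each = 3),
  latitude  = c(33, 34, 35, 36, 37, 38),
  num_count = c(4, 6, 2, 3, 5, 1)
)

# Each year bin becomes one animation state; points tween between states
anim <- ggplot(toy_anim, aes(x = latitude, y = num_count)) +
  geom_point(size = 3) +
  labs(title = "Year bin: {closest_state}", x = "Latitude", y = "Count") +
  transition_states(year_bin, transition_length = 1, state_length = 2)

# Rendering is a separate step and needs a renderer package such as gifski:
# animate(anim)
```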